Starr County
Truthful and Trustworthy IoT AI Agents via Immediate-Penalty Enforcement under Approximate VCG Mechanisms
Shao, Xun, Shimizu, Ryuuto, Liu, Zhi, Ota, Kaoru, Dong, Mianxiong
Abstract: The deployment of autonomous AI agents in Internet of Things (IoT) energy systems requires decision-making mechanisms that remain robust, efficient, and trustworthy under real-time constraints and imperfect monitoring. While reinforcement learning enables adaptive prosumer behaviors, ensuring economic consistency and preventing strategic manipulation remain open challenges, particularly when sensing noise or partial observability degrades the operator's ability to verify actions. This paper introduces a trust-enforcement framework for IoT energy trading that combines an α-approximate Vickrey-Clarke-Groves (VCG) double auction with an immediate one-shot penalty. Unlike reputation- or history-based approaches, the proposed mechanism restores truthful reporting within a single round, even when allocation accuracy is approximate and monitoring is noisy. We theoretically characterize the incentive gap induced by approximation and derive a penalty threshold that guarantees truthful bidding under bounded sensing errors. To evaluate learning-enabled prosumers, we embed the mechanism into a multi-agent reinforcement learning environment reflecting stochastic generation, dynamic loads, and heterogeneous trading opportunities. Experiments show that improved allocation accuracy consistently reduces deviation incentives, the required penalty matches analytical predictions, and learned bidding behaviors remain stable and interpretable despite imperfect monitoring. These results demonstrate that lightweight penalty designs can reliably align strategic IoT agents with socially efficient energy-trading outcomes. The rapid expansion of the Internet of Things (IoT) has created large-scale networks of heterogeneous sensors, distributed devices, and autonomous software agents that must jointly perceive, reason, and act in dynamic cyber-physical environments.
X. Shao and R. Shimizu are with the Department of Electrical and Electronic Information Engineering, Toyohashi University of Technology, Toyohashi, Aichi 441-8580, Japan (e-mail: xun.shao@tut.jp).
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > Texas > Starr County (0.04)
- North America > United States > Oklahoma (0.04)
- (4 more...)
- Energy > Power Industry (0.88)
- Information Technology > Smart Houses & Appliances (0.71)
- Banking & Finance > Trading (0.68)
Do Large Language Models (LLMs) Understand Chronology?
Wongchamcharoen, Pattaraphon Kenny, Glasserman, Paul
Large language models (LLMs) are increasingly used in finance and economics, where prompt-based attempts to prevent look-ahead bias implicitly assume that models understand chronology. We test this fundamental question with a series of chronological tasks of increasing complexity over facts the model already knows from pre-training. Our tasks cover (1) chronological ordering, (2) conditional sorting (filter, then order), and (3) anachronism detection. We evaluate GPT-4.1, Claude-3.7 Sonnet with and without Extended Thinking (ET), and GPT-5 across multiple reasoning-effort settings. Across models, the exact-match rate drops sharply as sequences lengthen even while rank correlations stay high: LLMs largely preserve local order but struggle to maintain a single globally consistent timeline. In conditional sorting, most failures stem from the filtering step rather than the ordering step, but GPT-5 and Claude-3.7 Sonnet with Extended Thinking significantly outperform the standard models. Lastly, anachronism detection is the easiest task for the LLMs, but performance still declines as timelines or entities increasingly overlap. Overall, our main contribution is showing that allocating an explicit reasoning budget helps with chronological ordering: GPT-5 at medium/high reasoning effort achieves flawless ordering at all lengths and perfect conditional sorting (both self-filtered and given-subset), whereas low/minimal effort degrades with longer lists, mirroring earlier models. Our findings delineate the limits of current LLMs on chronological tasks, provide insights into task complexity, and demonstrate scenarios in which reasoning helps. These patterns are important for the real-time application of LLMs in finance. We release all code and evaluation templates to support full reproducibility.
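The gap between exact match and rank correlation described in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the function names `exact_match` and `spearman_rho`, the 20-item list, and the single adjacent swap are all assumptions chosen for illustration, and Spearman's rho is used here only as one common rank-correlation measure.

```python
def exact_match(predicted, gold):
    """True only if the predicted order matches the gold order everywhere."""
    return predicted == gold


def spearman_rho(predicted, gold):
    """Spearman rank correlation for two orderings of the same distinct items."""
    n = len(gold)
    gold_rank = {item: i for i, item in enumerate(gold)}
    # Sum of squared differences between predicted and gold positions.
    d2 = sum((i - gold_rank[item]) ** 2 for i, item in enumerate(predicted))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical data: 20 events labelled 0..19 in true chronological order.
gold = list(range(20))

# A prediction that is right everywhere except one adjacent pair.
predicted = gold.copy()
predicted[7], predicted[8] = predicted[8], predicted[7]

print(exact_match(predicted, gold))   # the whole sequence counts as a miss
print(spearman_rho(predicted, gold))  # the correlation stays very close to 1
```

A single local slip is enough to fail exact match, and longer lists offer more chances for such a slip, while the rank correlation barely moves. That is consistent with the pattern the abstract reports.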
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > United States > Ohio (0.05)
- North America > United States > Massachusetts (0.04)
- (24 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.87)
- North America > United States > Texas > Starr County (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- North America > Canada > Alberta > Census Division No. 13 > Woodlands County (0.04)
Sketch-Augmented Features Improve Learning Long-Range Dependencies in Graph Neural Networks
Hosseini, Ryien, Simini, Filippo, Vishwanath, Venkatram, Willett, Rebecca, Hoffmann, Henry
Graph Neural Networks learn on graph-structured data by iteratively aggregating local neighborhood information. While this local message passing paradigm imparts a powerful inductive bias and exploits graph sparsity, it also yields three key challenges: (i) oversquashing of long-range information, (ii) oversmoothing of node representations, and (iii) limited expressive power. In this work we inject randomized global embeddings of node features, which we term Sketched Random Features, into standard GNNs, enabling them to efficiently capture long-range dependencies. The embeddings are unique, distance-sensitive, and topology-agnostic -- properties which we analytically and empirically show alleviate the aforementioned limitations when injected into GNNs. Experimental results on real-world graph learning tasks confirm that this strategy consistently improves performance over baseline GNNs, offering both a standalone solution and a complementary enhancement to existing techniques such as graph positional encodings. Our source code is available at https://github.com/ryienh/sketched-random-features.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Texas > Starr County (0.04)
- Asia > Middle East > Israel (0.04)
Agentic Temporal Graph of Reasoning with Multimodal Language Models: A Potential AI Aid to Healthcare
Healthcare and medicine are multimodal disciplines that rely on multimodal data to reason about and diagnose diseases. Although some multimodal reasoning models have emerged for reasoning about complex tasks in scientific domains, their applications in the healthcare domain remain limited and fall short of correct diagnostic reasoning. To address the challenges of multimodal medical reasoning for correct diagnosis and to assist healthcare professionals, this work proposes a novel temporal graph-based reasoning process modelled as a directed graph. It accommodates dynamic changes in reasons through backtracking, refining the reasoning content, and creating new or deleting existing reasons to reach the best recommendation or answer. In addition, considering multimodal data at different time points enables tracking and analysis of patient health and disease progression. Moreover, the proposed multi-agent temporal reasoning framework provides task distribution and a cross-validation mechanism to further enhance the accuracy of reasoning outputs. A few basic experiments and analysis results support the novelty and practical utility of the proposed preliminary approach.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Texas > Starr County (0.04)
- Research Report (1.00)
- Overview (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Multi-level Value Alignment in Agentic AI Systems: Survey and Perspectives
Zeng, Wei, Zhu, Hengshu, Qin, Chuan, Wu, Han, Cheng, Yihang, Zhang, Sirui, Jin, Xiaowei, Shen, Yinuo, Wang, Zhenxing, Zhong, Feimin, Xiong, Hui
The ongoing evolution of AI paradigms has propelled AI research into the agentic AI stage. Consequently, the focus of research has shifted from single agents and simple applications towards multi-agent autonomous decision-making and task collaboration in complex environments. As Large Language Models (LLMs) advance, their applications become more diverse and complex, leading to increasing situational and systemic risks. This has brought significant attention to value alignment for agentic AI systems, which aims to ensure that an agent's goals, preferences, and behaviors align with human values and societal norms. Addressing socio-governance demands through a Multi-level Value framework, this study comprehensively reviews value alignment in LLM-based multi-agent systems as the representative archetype of agentic AI systems. Our survey systematically examines three interconnected dimensions: First, value principles are structured via a top-down hierarchy across macro, meso, and micro levels. Second, application scenarios are categorized along a general-to-specific continuum explicitly mirroring these value tiers. Third, value alignment methods and evaluation are mapped to this tiered framework through systematic examination of benchmarking datasets and relevant methodologies. Additionally, we delve into value coordination among multiple agents within agentic AI systems. Finally, we propose several potential research directions in this field.
- Europe > Austria > Vienna (0.15)
- Asia > Thailand > Bangkok > Bangkok (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
- (21 more...)
- Research Report (1.00)
- Overview (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- (6 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution
Coca, Alexandru, Gaynor, Mark, Zhang, Zhenxing, Cheng, Jianpeng, Tseng, Bo-Hsiang, Boothroyd, Pete, Alonso, Héctor Martinez, Séaghdha, Diarmuid Ó, Johannsen, Anders
This work evaluates the potential of large language models (LLMs) to power digital assistants capable of complex action execution. These assistants rely on pre-trained programming knowledge to execute multi-step goals by composing objects and functions defined in assistant libraries into action execution programs. To achieve this, we develop ASPERA, a framework comprising an assistant library simulation and a human-assisted LLM data generation engine. Our engine allows developers to guide LLM generation of high-quality tasks consisting of complex user queries, simulation state and corresponding validation programs, tackling data availability and evaluation robustness challenges. Alongside the framework we release Asper-Bench, an evaluation dataset of 250 challenging tasks generated using ASPERA, which we use to show that program generation grounded in custom assistant libraries is a significant challenge to LLMs compared to dependency-free code generation.
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (21 more...)
- Workflow (1.00)
- Research Report (0.81)
- Instructional Material (0.67)
Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans
Qiu, Yansheng, Xiao, Li, Xu, Zhaopan, Zhou, Pengfei, Wang, Zheng, Zhang, Kaipeng
The goal of achieving Artificial General Intelligence (AGI) is to imitate humans and surpass them. Models such as OpenAI's o1, o3, and DeepSeek's R1 have demonstrated that large language models (LLMs) with human-like reasoning capabilities exhibit exceptional performance and are being gradually integrated into multimodal large language models (MLLMs). However, whether these models possess capabilities comparable to humans in handling reasoning tasks remains unclear at present. In this paper, we propose Human-Aligned Bench, a benchmark for fine-grained alignment of multimodal reasoning with human performance. Specifically, we collected 9,794 multimodal questions that solely rely on contextual reasoning, including bilingual (Chinese and English) multimodal questions and pure text-based questions, encompassing four question types: visual reasoning, definition judgment, analogical reasoning, and logical judgment. More importantly, each question is accompanied by human success rates and options that humans are prone to choosing incorrectly. Extensive experiments on the Human-Aligned Bench reveal notable differences between the performance of current MLLMs in multimodal reasoning and human performance. The findings on our benchmark provide insights into the development of the next-generation models.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Texas > Starr County (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > China > Heilongjiang Province > Harbin (0.04)
POLYRAG: Integrating Polyviews into Retrieval-Augmented Generation for Medical Applications
Gan, Chunjing, Yang, Dan, Hu, Binbin, Liu, Ziqi, Shen, Yue, Zhang, Zhiqiang, Wang, Jian, Zhou, Jun
Large language models (LLMs) have become a disruptive force in the industry, introducing unprecedented capabilities in natural language processing, logical reasoning, and more. However, the challenges of knowledge updates and hallucination have limited the application of LLMs in medical scenarios, where retrieval-augmented generation (RAG) can offer significant assistance. Nevertheless, existing retrieve-then-read approaches generally digest the retrieved documents without considering the timeliness, authoritativeness, and commonality of retrieval. We argue that these approaches can be suboptimal, especially in real-world applications where information from different sources might conflict and even information from the same source might differ across time scales; relying entirely on such retrieved documents would degrade the performance of RAG approaches. We propose PolyRAG, which carefully incorporates judges from different perspectives and finally integrates the polyviews for retrieval-augmented generation in medical applications. Because real-world benchmarks for evaluation are scarce, we bridge the gap by proposing PolyEVAL, a benchmark consisting of queries and documents collected from real-world medical scenarios (including medical policy, hospital & doctor inquiry, and healthcare) with multiple tags (e.g., timeliness, authoritativeness). Extensive experiments and analysis on PolyEVAL demonstrate the superiority of PolyRAG.
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > China (0.04)
- North America > United States > Texas > Starr County (0.04)
- Asia > Middle East > Saudi Arabia > Asir Province > Abha (0.04)
- Workflow (0.46)
- Research Report (0.40)